
Vulnerability to Stability: Scalable Large Language Model in Queue-Based Web Service

Barek, Md Abdul; Rahid, Md Bajlur; Rahman, Md Mostafizur; Riad, ABM Kamrul; Francia, Guillermo; Shahriar, Hossain (July 2025, IEEE)

Large Language Models (LLMs) have demonstrated exceptional capabilities in Artificial Intelligence (AI) and are now widely used in applications worldwide. A major challenge, however, is handling high-concurrency workloads, especially under extreme conditions. When too many requests arrive simultaneously, LLMs often become unresponsive, which degrades performance and reduces reliability in real-world applications. To address this issue, this paper proposes a queue-based system that separates request handling from direct execution. With a distributed queue, requests are processed in a structured and controlled manner, which prevents system overload and keeps performance stable. The approach also supports dynamic scalability: additional resources can be allocated as needed to maintain efficiency. Our experimental results show that this method significantly improves resilience under heavy workloads, preventing resource exhaustion and enabling linear scalability. The findings highlight the effectiveness of a queue-based web service in keeping LLMs responsive even under extreme workloads.
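The core idea of separating request handling from execution can be illustrated with a small sketch. This is not the paper's implementation: it assumes an in-process `asyncio.Queue` in place of a distributed queue, and `fake_llm`, `worker`, `serve`, and `run_demo` are hypothetical names introduced only for illustration. Requests are accepted into a bounded queue, and a fixed pool of workers drains it, so the number of concurrent model calls never exceeds the pool size.

```python
import asyncio


async def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call (assumption, not from the paper).
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"


async def worker(queue: asyncio.Queue, active: dict) -> None:
    # Pull one request at a time; each worker runs at most one model call.
    while True:
        prompt, future = await queue.get()
        active["now"] += 1
        active["peak"] = max(active["peak"], active["now"])
        try:
            future.set_result(await fake_llm(prompt))
        except Exception as exc:
            future.set_exception(exc)
        finally:
            active["now"] -= 1
            queue.task_done()


async def serve(prompts, num_workers: int = 2, max_queue: int = 100):
    # The bounded queue applies backpressure: put() waits when the queue is full.
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue)
    active = {"now": 0, "peak": 0}
    workers = [asyncio.create_task(worker(queue, active)) for _ in range(num_workers)]

    loop = asyncio.get_running_loop()
    futures = []
    for prompt in prompts:
        future = loop.create_future()
        await queue.put((prompt, future))  # accept the request, do not execute it here
        futures.append(future)

    results = await asyncio.gather(*futures)

    # Shut the worker pool down once every request has been answered.
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results, active["peak"]


def run_demo(n: int = 20, num_workers: int = 2):
    # Submit n requests at once; return the responses and the peak concurrency seen.
    return asyncio.run(serve([f"q{i}" for i in range(n)], num_workers=num_workers))
```

In this sketch, raising `num_workers` is the analogue of allocating additional resources, while `max_queue` bounds how much pending work is held. A distributed deployment would presumably replace the in-process queue with a message broker shared by several worker processes or machines.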

Free, publicly-accessible full text available July 8, 2026
